Multi-label Classification with Multiple Class Ontologies

Author

  • Fernando Benites de Azevedo e Souza
Abstract

The term “big data” is becoming increasingly important in many fields. Such data must not only be gathered but also analyzed and, in some cases, classified. Categorizing each sample is becoming increasingly multifaceted, since it often means assigning not just one category from one ontology but multiple labels from multiple ontologies. This study investigates how association rule mining can improve classification performance in such multi-label problems. The innovative character of the study lies in the use of an extended neural classifier based on adaptive resonance theory networks, together with rare association rule mining, to extract useful knowledge from classification data. The central hypothesis is that discovering deep connections between multi-labels of different taxonomies improves the predictions of the classifier system and allows the extraction of interesting knowledge. This rests on the observation that classifiers learn labels high in the ontology well, since these labels have many examples. The deeper a label lies, the fewer training samples the classifier has for it, and other methods need to be developed to cope with this difficulty. On the other hand, some taxonomies are easier to learn, and their predictions, together with association rules, can help increase the prediction quality of the system. Further, since classification systems in the big data setting will become increasingly complex, experts interacting with the system should be able to identify conflicts in the classification rules and correct them. To formalize the problem, we seek a classification model that maps objects from a description space into a set of classes given by multiple taxonomies (the taxonomy space). Here, we prefer models that provide sensible and understandable rules, as they allow the extracted knowledge to be examined and verified.
We assume that the taxonomy spaces are so complex that, for practical use, each taxonomy is organized as a tree. We expect the most interesting connections between taxonomies to be found when the taxonomies are very different, both conceptually and in nature (including those mapped by different structures). In such a scenario, we expect new, surprising knowledge to arise from the connections between the taxonomies. The main contribution of the thesis is the extensive examination of rare association rules for improving multi-label predictions in the setting of cross-ontology classification, in particular the proposed approach called Multi-label Improvement with Rare Association Rules (MIRAR). A further contribution is ML-HARAM, a hierarchical multi-label classifier based on Adaptive Resonance Theory (ART). The last contribution is the Rule Explorer, a graphical user interface for analyzing each step of the classification process in depth: from the creation of a classification rule, through its examination, to its use with interestingness measures to improve the prediction quality of the classifier and its predicted labels. Extensive experiments indicate, with statistical significance, that the results support the hypothesis well and that, on common multi-label classification measures, the achieved results exceed state-of-the-art performance.
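
The general idea of correcting multi-label predictions with cross-taxonomy association rules can be illustrated with a minimal sketch. This is not the MIRAR algorithm from the thesis, whose details the abstract does not give. It assumes rules of the simple form "label a in taxonomy A implies label b in taxonomy B", kept when their confidence is high even if their support is low (which is what makes a rule "rare"). The function names, thresholds, and toy labels are all hypothetical.

```python
from collections import Counter
from itertools import product


def mine_cross_rules(label_sets_a, label_sets_b, min_support=2, min_conf=0.8):
    """Mine rules a -> b between labels of two taxonomies (illustrative only).

    label_sets_a[i] and label_sets_b[i] are the label sets of training
    sample i in taxonomy A and taxonomy B. min_support is an absolute count
    and is deliberately small, so infrequent but reliable pairs are kept.
    """
    count_a = Counter()   # how often each A-label occurs
    count_ab = Counter()  # how often each (A-label, B-label) pair co-occurs
    for labels_a, labels_b in zip(label_sets_a, label_sets_b):
        count_a.update(labels_a)
        count_ab.update(product(labels_a, labels_b))

    rules = {}
    for (a, b), n_ab in count_ab.items():
        confidence = n_ab / count_a[a]
        if n_ab >= min_support and confidence >= min_conf:
            rules[(a, b)] = confidence
    return rules


def apply_rules(pred_a, pred_b, rules):
    """Post-process predictions: add each B-label implied by a predicted A-label."""
    corrected = set(pred_b)
    for a, b in rules:
        if a in pred_a:
            corrected.add(b)
    return corrected


# Toy training data with hypothetical labels from two taxonomies.
train_a = [{"a1"}, {"a1", "a2"}, {"a2"}, {"a3"}, {"a3"}]
train_b = [{"b1"}, {"b1"}, {"b2"}, {"b3"}, {"b3", "b1"}]

rules = mine_cross_rules(train_a, train_b)
corrected = apply_rules(pred_a={"a1"}, pred_b={"b2"}, rules=rules)
```

In this sketch the taxonomy that is assumed easier to learn (A) supplies the antecedents, and the rules only add labels to the predictions for the harder taxonomy (B). A real system would also need to handle conflicting rules and the tree structure of each taxonomy.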

Similar resources

Exploiting Associations between Class Labels in Multi-label Classification

Multi-label classification has many applications in the text categorization, biology and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...

Using Semantic Data Mining for Classification Improvement and Knowledge Extraction

The objective of this position paper is to show that the integration of semantic data mining into the DAMIART data mining system can help further improve classification performance and knowledge extraction. DAMIART performs multi-label classification in the presence of multiple class ontologies, hierarchy extraction from multi-labels and concept relation by association rule mining. Whereas DAMI...

Improving Multi-Label Classification by Means of Cross-Ontology Association Rules

Recently several methods were proposed for the improvement of multi-label classification performance by using constraints on labels. Such constraints are based on dependencies between classes often present in multi-label data and can be mined as association rules from training data. The rules are then applied in a post-processing step to correct the classifier predictions. Due to properties of ...

MLIFT: Enhancing Multi-label Classifier with Ensemble Feature Selection

Multi-label classification has gained significant attention during recent years, due to the increasing number of modern applications associated with multi-label data. Despite its short life, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label classifier which utilizes a new strategy to multi-label learning by leveraging label-specific ...

Multi-label classification and extracting predicted class hierarchies

This paper investigates hierarchy extraction from results of multi-label classification (MC). MC deals with instances labeled by multiple classes rather than just one, and the classes are often hierarchically organized. Usually multi-label classifiers rely on a predefined class hierarchy. A much less investigated approach is to suppose that the hierarchy is unknown and to infer it automatically...

Labelling strategies for hierarchical multi-label classification techniques

Many hierarchical multi-label classification systems predict a real valued score for every (instance, class) couple, with a higher score reflecting more confidence that the instance belongs to that class. These classifiers leave the conversion of these scores to an actual label set to the user, who applies a cut-off value to the scores. The predictive performance of these classifiers is usually...

Publication date: 2017